A boundedness result for the direct heuristic dynamic programming
Abstract
Approximate/adaptive dynamic programming (ADP) has been studied extensively in recent years for its potential scalability to large state and control space problems, including those involving continuous states and continuous controls. The applicability of ADP algorithms, especially the adaptive critic designs, has been demonstrated in several case studies. Direct heuristic dynamic programming (direct HDP) is one of the ADP algorithms inspired by the adaptive critic designs. It has been shown to be applicable to industrial-scale, realistic, and complex control problems. In this paper, we provide a uniform ultimate boundedness (UUB) result for the direct HDP learning controller under mild and intuitive conditions. Using a Lyapunov approach, we show that the estimation errors of the learning parameters, or weights, in the action and critic networks remain UUB. This result provides, for the first time, a useful controller convergence guarantee for the direct HDP design.
Similar resources
Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method
In this paper, with the aim of estimating the internal dynamics matrix of a gimbaled inertial navigation system (as a discrete linear system), the discrete-time Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been extracted. Heuristic Dynamic Programming algorithm (HDP) for solving equation has been presented and then a neural network approximation for cost function and control input ...
New scheduling rules for a dynamic flexible flow line problem with sequence-dependent setup times
In the literature, the application of multi-objective dynamic scheduling problem and simple priority rules are widely studied. Although these rules are not efficient enough due to simplicity and lack of general insight, composite dispatching rules have a very suitable performance because they result from experiments. In this paper, a dynamic flexible flow line problem with sequence-dependent se...
A Hybrid Dynamic Programming for Inventory Routing Problem in Collaborative Reverse Supply Chains
Inventory routing problems arise as simultaneous decisions in inventory and routing optimization. In the present study, vendor managed inventory is proposed as a collaborative model for reverse supply chains and the optimization problem is modeled in terms of an inventory routing problem. The studied reverse supply chains include several return generators and recovery centers and one collection...
Boundedness of KKT Multipliers in fractional programming problem using convexificators
In this paper, using the idea of convexificators, we study boundedness and nonemptiness of Lagrange multipliers satisfying the first order necessary conditions. We consider a class of nonsmooth fractional programming problems with equality, inequality constraints and an arbitrary set constraint. Within this context, define generalized Mangasarian-Fromovitz constraint qualification and sh...
Boundedness of iterates in Q-Learning
Reinforcement Learning (RL) is a simulation-based counterpart of stochastic dynamic programming. In recent years, it has been used in solving complex Markov decision problems (MDPs). Watkins’ Q-Learning is by far the most popular RL algorithm used for solving discounted-reward MDPs. The boundedness of the iterates in Q-Learning plays a critical role in its convergence analysis and in making the...
Journal:
- Neural networks : the official journal of the International Neural Network Society
Volume 32
Publication date 2012